Speaker verification (SV) provides billions of voice-enabled devices with access control and ensures the security of voice-driven technologies. As a type of biometric, it is necessary that SV is unbiased, with consistent and reliable performance across speakers irrespective of their demographic, social, and economic attributes. Current SV evaluation practices are insufficient for evaluating bias: they are over-simplified and aggregated over users, are not representative of real-life usage scenarios, and do not account for the consequences of errors. This paper proposes design guidelines for constructing SV evaluation datasets that address these shortcomings. We propose a schema for grading the difficulty of utterance pairs and present an algorithm for generating inclusive SV datasets. We validate our proposed method in a set of experiments on the VoxCeleb1 dataset. Our results confirm that the count of utterance pairs per speaker and the difficulty grading of utterance pairs have a significant effect on evaluation performance and variability. Our work contributes to the development of SV evaluation practices that are inclusive and fair.
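The abstract does not spell out the grading schema or the generation algorithm, so the following is only a minimal sketch of the general idea: grade each trial pair with a simple proxy for difficulty (here, session mismatch and short duration, both hypothetical criteria) and sample an equal, difficulty-stratified number of same-speaker trials for every speaker.

```python
import itertools
import random
from collections import defaultdict

def grade_difficulty(utt_a, utt_b):
    """Hypothetical difficulty grade for a trial pair; the paper's actual
    schema may use different criteria."""
    grade = 0
    if utt_a["session"] != utt_b["session"]:
        grade += 1                       # session/channel mismatch is harder
    if min(utt_a["dur"], utt_b["dur"]) < 4.0:
        grade += 1                       # little speech to compare is harder
    return grade                         # 0 = easy ... 2 = hard

def build_target_trials(utterances, pairs_per_speaker=30, seed=0):
    """Sample the same number of same-speaker trials per speaker, spread
    across difficulty grades, so no speaker dominates the evaluation."""
    rng = random.Random(seed)
    by_spk = defaultdict(list)
    for u in utterances:
        by_spk[u["speaker"]].append(u)

    trials = []
    for spk, utts in by_spk.items():
        candidates = list(itertools.combinations(utts, 2))
        rng.shuffle(candidates)
        buckets = defaultdict(list)
        for a, b in candidates:
            buckets[grade_difficulty(a, b)].append((a, b))
        picked = []
        while len(picked) < pairs_per_speaker and any(buckets.values()):
            for g in sorted(buckets):            # round-robin over grades
                if buckets[g] and len(picked) < pairs_per_speaker:
                    picked.append(buckets[g].pop())
        trials.extend((a["id"], b["id"], 1) for a, b in picked)  # 1 = target
    return trials
```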
Automated speaker recognition uses data processing to identify speakers by their voice. Today, automated speaker recognition is deployed on billions of smart devices and in services such as call centres. Despite its wide-scale deployment and the known sources of bias in related domains such as face recognition and natural language processing, bias in automated speaker recognition has not been studied systematically. We present an in-depth empirical and analytical study of bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition. Drawing on an established framework for understanding sources of harm in machine learning, we show that bias exists at every development stage of the well-known VoxCeleb Speaker Recognition Challenge, including data generation, model building, and implementation. Most affected are female speakers and speakers of non-US nationalities, who experience significant performance degradation. Leveraging the insights from our findings, we make practical recommendations for mitigating bias in automated speaker recognition and outline directions for future research.
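A common way to surface the kind of subgroup performance gaps reported here is to compute the equal error rate (EER) separately per demographic group rather than as a single aggregate. The sketch below assumes trial scores, target/non-target labels, and a subgroup attribute per trial; it is an illustration of the analysis idea, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.metrics import roc_curve

def eer(scores, labels):
    """Equal error rate: operating point where false-accept and
    false-reject rates coincide."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

def eer_by_subgroup(scores, labels, groups):
    """EER per subgroup (e.g. gender or nationality) to expose gaps that a
    single aggregate number hides."""
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    return {g: eer(scores[groups == g], labels[groups == g])
            for g in np.unique(groups)}
```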
Causal chain reasoning (CCR) is an essential ability for many decision-making AI systems, which requires the model to build reliable causal chains by connecting causal pairs. However, CCR suffers from two main transitive problems: threshold effect and scene drift. In other words, the causal pairs to be spliced may have a conflicting threshold boundary or scenario. To address these issues, we propose a novel Reliable Causal chain reasoning framework~(ReCo), which introduces exogenous variables to represent the threshold and scene factors of each causal pair within the causal chain, and estimates the threshold and scene contradictions across exogenous variables via structural causal recurrent neural networks~(SRNN). Experiments show that ReCo outperforms a series of strong baselines on both Chinese and English CCR datasets. Moreover, by injecting reliable causal chain knowledge distilled by ReCo, BERT can achieve better performances on four downstream causal-related tasks than BERT models enhanced by other kinds of knowledge.
Multimodal image-text models have shown remarkable performance in the past few years. However, evaluating their robustness against distribution shifts is crucial before adopting them in real-world applications. In this paper, we investigate the robustness of 9 popular open-sourced image-text models under common perturbations on five tasks (image-text retrieval, visual reasoning, visual entailment, image captioning, and text-to-image generation). In particular, we propose several new multimodal robustness benchmarks by applying 17 image perturbation and 16 text perturbation techniques on top of existing datasets. We observe that multimodal models are not robust to image and text perturbations, especially to image perturbations. Among the tested perturbation methods, character-level perturbations constitute the most severe distribution shift for text, and zoom blur is the most severe shift for image data. We also introduce two new robustness metrics (MMI and MOR) for proper evaluations of multimodal models. We hope our extensive study sheds light on new directions for the development of robust multimodal models.
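The abstract does not define MMI and MOR, so the sketch below only illustrates the benchmark construction idea under assumptions: one of the simplest text perturbations (character-level swaps, which the study finds most severe for text) plus a generic relative-performance-drop score as a stand-in robustness measure.

```python
import random

def char_swap(text, rate=0.1, seed=0):
    """Character-level perturbation: randomly swap adjacent characters."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def relative_drop(clean_score, perturbed_score):
    """Generic robustness measure: relative performance drop under
    perturbation (the paper's MMI/MOR metrics are defined differently)."""
    return (clean_score - perturbed_score) / max(clean_score, 1e-8)
```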
Datacenter operators ensure fair and regular server maintenance by using automated processes to schedule maintenance jobs to complete within a strict time budget. Automating this scheduling problem is challenging because maintenance job duration varies based on both job type and hardware. While it is tempting to use prior machine learning techniques for predicting job duration, we find that the structure of the maintenance job scheduling problem creates a unique challenge. In particular, we show that prior machine learning methods that produce the lowest-error predictions do not produce the best scheduling outcomes due to asymmetric costs. Specifically, underpredicting maintenance job duration results in more servers being taken offline and longer server downtime than overpredicting maintenance job duration. The system cost of underprediction is much larger than that of overprediction. We present Acela, a machine learning system for predicting maintenance job duration, which uses quantile regression to bias duration predictions toward overprediction. We integrate Acela into a maintenance job scheduler and evaluate it on datasets from large-scale, production datacenters. Compared to machine learning based predictors from prior work, Acela reduces the number of servers that are taken offline by 1.87-4.28X, and reduces the server offline time by 1.40-2.80X.
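Acela's own model is not detailed in the abstract, but the core idea of biasing toward overprediction via quantile regression can be sketched with scikit-learn's gradient boosting: fitting an upper quantile (say the 90th percentile) makes the pinball loss penalize underestimates far more than overestimates. Feature and data names below are placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Fit the 90th percentile of job duration instead of the mean, so the
# predictor systematically errs toward overprediction.
model = GradientBoostingRegressor(loss="quantile", alpha=0.9)

# X: placeholder job features (job type, hardware generation, ...);
# y: placeholder observed durations.
rng = np.random.default_rng(0)
X = rng.random((1000, 4))
y = 10 + 30 * X[:, 0] + rng.exponential(5, size=1000)
model.fit(X, y)

# Predicted durations used to budget the maintenance window.
predicted_budget = model.predict(X[:5])
```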
The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.
The performance of a camera network monitoring a set of targets depends crucially on the configuration of the cameras. In this paper, we investigate the reconfiguration strategy for the parameterized camera network model, with which the sensing qualities of the multiple targets can be optimized globally and simultaneously. We first propose to use the number of pixels occupied by a unit-length object in image as a metric of the sensing quality of the object, which is determined by the parameters of the camera, such as intrinsic, extrinsic, and distortional coefficients. Then, we form a single quantity that measures the sensing quality of the targets by the camera network. This quantity further serves as the objective function of our optimization problem to obtain the optimal camera configuration. We verify the effectiveness of our approach through extensive simulations and experiments, and the results reveal its improved performance on the AprilTag detection tasks. Codes and related utilities for this work are open-sourced and available at https://github.com/sszxc/MultiCam-Simulation.
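Under a plain pinhole model, the pixels-per-unit-length metric described above reduces to roughly the focal length in pixels divided by the target's depth in the camera frame. The sketch below uses that simplification and ignores lens distortion, which the paper's full metric also accounts for.

```python
import numpy as np

def pixels_per_unit_length(K, R, t, p_world):
    """Approximate sensing quality of a point target: pixels spanned by a
    unit-length object at that point, ideal pinhole model (distortion ignored).

    K: 3x3 intrinsics; R, t: world-to-camera rotation/translation;
    p_world: target position in world coordinates.
    """
    p_cam = R @ p_world + t
    depth = p_cam[2]
    if depth <= 0:
        return 0.0                        # target behind the camera
    f = 0.5 * (K[0, 0] + K[1, 1])         # average focal length in pixels
    return f / depth                      # a 1 m segment at depth Z spans ~f/Z px

# Example: 800 px focal length, target 4 m ahead -> ~200 px per metre.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
q = pixels_per_unit_length(K, np.eye(3), np.zeros(3), np.array([0.0, 0.0, 4.0]))
```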
Designing safety-critical control for robotic manipulators is challenging, especially in a cluttered environment. First, the actual trajectory of a manipulator might deviate from the planned one due to complex collision environments and non-trivial dynamics, leading to collisions; second, the feasible space for the manipulator is hard to obtain since the explicit distance functions between collision meshes are unknown. By analyzing the relationship between the safe set and the controlled invariant set, this paper proposes a data-driven control barrier function (CBF) construction method, which extracts a CBF from distance samples. Specifically, the CBF guarantees the controlled invariance property while accounting for the system dynamics. The data-driven method samples the distance function and determines the safe set. Then, the CBF is synthesized from the safe set by a scenario-based sum-of-squares (SOS) program. Unlike most existing linearization-based approaches, our method preserves the volume of the feasible space for planning without approximation, which helps find a solution in a cluttered environment. The control law is obtained by solving a CBF-based quadratic program in real time, which acts as a safety filter on the desired planning-based controller. Moreover, our method guarantees safety with a proven probabilistic result. Our method is validated on a 7-DOF manipulator in both real and virtual cluttered environments. The experiments show that the manipulator is able to execute tasks where the clearance between obstacles is on the order of millimeters.
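The SOS synthesis of the barrier function is not sketched here; the snippet below only illustrates the final step named in the abstract, the real-time CBF-based quadratic program acting as a safety filter, for a generic control-affine system with an assumed barrier function h and its gradient (cvxpy is used for the QP; the paper's solver may differ).

```python
import cvxpy as cp
import numpy as np

def cbf_safe_filter(u_des, x, f, g, h, grad_h, alpha=5.0):
    """Minimally modify the desired input so the CBF condition
        grad_h(x) . (f(x) + g(x) u) >= -alpha * h(x)
    holds, keeping the state inside the safe set {x : h(x) >= 0}."""
    u = cp.Variable(len(u_des))
    objective = cp.Minimize(cp.sum_squares(u - u_des))   # stay close to the planner
    constraints = [grad_h(x) @ (f(x) + g(x) @ u) >= -alpha * h(x)]
    cp.Problem(objective, constraints).solve()
    return u.value

# Toy single-integrator example: keep x[0] <= 1, i.e. h(x) = 1 - x[0].
f = lambda x: np.zeros(2)
g = lambda x: np.eye(2)
h = lambda x: 1.0 - x[0]
grad_h = lambda x: np.array([-1.0, 0.0])
u_safe = cbf_safe_filter(np.array([2.0, 0.0]), np.array([0.9, 0.0]), f, g, h, grad_h)
```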
Transformer-based language models have become the standard approach to solving natural language processing tasks. However, industry adoption usually requires maximizing throughput while complying with certain latency constraints, which prevents Transformer models from being used in production. To address this gap, model compression techniques such as quantization and pruning may be used to improve inference efficiency. However, these compression techniques require specialized software to apply and deploy at scale. In this work, we propose a new pipeline for creating and running Fast Transformer models on CPUs, utilizing hardware-aware pruning, knowledge distillation, quantization, and our own Transformer inference runtime engine with optimized kernels for sparse and quantized operators. We demonstrate the efficiency of our pipeline by creating a Fast DistilBERT model with minimal accuracy loss on the question-answering SQuADv1.1 benchmark, and report throughput results under typical production constraints and environments. Our results outperform the existing state-of-the-art Neural Magic DeepSparse runtime by up to 50%, and achieve up to a 4.1x speedup over ONNX Runtime. Source code is publicly available at https://github.com/intel/intel-extension-for-transformers.
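The authors' runtime engine and full pruning/distillation/quantization pipeline are not reproduced here; as a rough illustration of only the quantization step for CPU inference, the sketch below applies PyTorch's built-in dynamic INT8 quantization to a public DistilBERT SQuAD checkpoint (not the paper's toolchain).

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Dense DistilBERT QA model; the paper additionally applies hardware-aware
# pruning and knowledge distillation before quantizing.
name = "distilbert-base-cased-distilled-squad"
model = AutoModelForQuestionAnswering.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

# Dynamic INT8 quantization of the linear layers for CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("Who wrote Hamlet?", "Hamlet was written by Shakespeare.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = quantized(**inputs)   # start/end logits for answer span extraction
```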
Neural Radiance Fields (NeRF) have been successfully used for scene representation. Recent work has also developed robot navigation and manipulation systems on top of NeRF-based environment representations. Since object localization is fundamental to many robotic applications, to further unleash the potential of NeRF in robotic systems we study object localization within NeRF scenes. We propose NeRF-Loc, a transformer-based framework that extracts 3D bounding boxes of objects in NeRF scenes. NeRF-Loc takes a pre-trained NeRF model and camera views as input and produces labeled, oriented 3D bounding boxes of objects as output. Specifically, we design a pair of parallel transformer encoder branches, namely a coarse stream and a fine stream, to encode both the context and the details of target objects. The encoded features are then fused with attention layers to alleviate ambiguity for accurate object localization. We compare our method with conventional transformer-based methods, and ours achieves better performance. In addition, we present NeRFLocBench, the first NeRF-samples-based object localization benchmark.
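Beyond the two parallel encoder branches fused by attention, the architecture is not specified in the abstract; the following is a generic PyTorch rendition of that branch-and-fuse idea with assumed feature dimensions and an assumed 7-parameter oriented-box head, not the exact NeRF-Loc model.

```python
import torch
import torch.nn as nn

class DualStreamFusion(nn.Module):
    """Two parallel transformer encoders (coarse/context and fine/detail)
    whose outputs are fused with cross-attention; a sketch only."""
    def __init__(self, d_model=256, nhead=8, num_layers=3):
        super().__init__()
        self.coarse = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.fine = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.fusion = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.box_head = nn.Linear(d_model, 7)   # e.g. center (3), size (3), yaw (1)

    def forward(self, coarse_tokens, fine_tokens):
        c = self.coarse(coarse_tokens)          # scene-level context features
        f = self.fine(fine_tokens)              # object-level detail features
        fused, _ = self.fusion(query=f, key=c, value=c)
        return self.box_head(fused.mean(dim=1)) # one oriented 3D box per sample

boxes = DualStreamFusion()(torch.randn(2, 64, 256), torch.randn(2, 128, 256))
```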